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-We  develop:  a  general  theory  for  variance  function  estimation  in 


regression.  Most  methods  in  coaaon  use  are  Included  in  our  development. 
The  general  qualitative  conclusions  are  these.  First,  most  variance 


function  estiaation  procedures  can  be  looked  upon  as  regressions  with 

r  9- 

’responses  being  transformations  of  absolute  residuals  froa  a  preliainary 


fit  or  sample  standard  deviations  fron  replicates  at  a  design  point.  Our 
conclusion  is  that  the  foraer  is  typically  more  efficient,  but  not  uniformly 
so.  Secondly,  for  variance  function  estimates  based  on  transformations  of 


absolute  residuals,  we  show  that  efficiency  is  a  monotone  function  of  the 
efficiency  of  the  fit  from  which  the  residuals  are  formed,  at  least  for 
symmetric  errors.  Our  conclusion  is  that  one  should  iterate  so  that  the 

residuals  are  based  on  generalized  least  squares.  Finally,  robustness 
issues  are  of  even  more  importance  here  than  in  estimation  of  a_ regression 
function  for  the  mean.  The  loss  of  efficiency  of  the  standard  method  away 

from  the  normal  distribution  is  much  more  rapid  than  in  the  regression 

/  ' 

problem. 
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Consider  a  heteroscedastic  regression  model  for  observable  data  Y: 

(1.1)  EYi  =  iJi  =  f(xr^);  Var  (YJ )  -  oVUj  ,p,9 ) . 

Here,  {x^k  x  1)}  are  the  design  vectors,  /3(p  x  1)  is  the  regression  parameter, 
f  is  the  mean  response  function,  and  the  variance  function  g  expresses  the 
heteroscedasticity ,  where  (z^(f  x  1)}  are  known  vectors,  possibly  the  {x^},  a 
is  an  unknown  scale  parameter,  and  9(r  x  1)  is  an  unknown  parameter.  Models 
which  may  be  regarded  as  special  cases  of  (1.1)  are  used  in  diverse  fields, 
including  radioimmunoassay,  econometrics,  pharmokinetic  modeling,  enzyme 
kinetics  and  chemical  kinetics  among  others.  The  usual  emphasis  is  on 
estimation  of  p  with  estimation  of  the  variances  as  an  adjunct. 

The  most  common  method  for  estimating  p  is  generalized  least  squares,  in 
which  one  estimates  g(z^,/J,9)  by  using  an  estimate  of  9  and  a  preliminary 
estimate  of  p  and  then  performs  weighted  least  squares;  see,  for  example, 
Carroll  and  Ruppert  (1982a)  and  Box  and  Hill  (1974).  This  might  be  iterated, 
with  the  preliminary  estimate  replaced  by  the  current  estimate  of  p,  a  new 
estimate  of  9  obtained  and  the  process  repeated.  Standard  asymptotic  theory  as 
in  Carroll  and  Ruppert  (1982a)  or  Jobson  and  Fuller  (1980)  shows  that  as  long 
as  the  preliminary  estimators  for  the  parameters  of  the  variance  function  are 
consistent,  all  estimators  of  p  obtained  in  this  way  will  be  asymptotically 
equivalent  to  the  weighted  least  squares  estimator  with  known  weights. 

There  is  evidence  that  for  finite  samples,  the  better  one's  estimate  of  9, 
the  better  one's  final  estimate  of  p .  Williams  (1975)  states  that  "both 
analytic  and  empirical  studies ...  indicate  that... the  ordering  of  efficiency  (of 
estimates  of  />)... in  small  samples  is  in  accordance  with  the  ordering  by 


efficiency  (of  estimates  of  0)."  Rothenberg  (1984)  shows  via  second  order 
calculations  that  if  g  does  not  depend  on  p,  when  the  data  are  normally 
distributed  the  covariance  matrix  of  the  generalized  least  squares  estimator  of 
p  is  an  Increasing  function  of  the  covariance  matrix  of  the  estimator  of  0. 

Second  order  asymptotics  provide  only  a  weak  justification  for  studying 
the  properties  of  variance  function  estimates.  Instead,  our  thesis  is  that 
estimation  of  the  structural  variance  parameter  9  is  of  Independent  interest. 
In  many  engineering  applications,  an  important  goal  is  to  estimate  the  error 
made  in  predicting  a  new  observation;  this  can  be  obtained  from  the  variance 
function  once  a  suitable  estimate  of  0  is  available.  In  chemical  and 
biological  assay  problems,  issues  of  prediction  and  calibration  arise.  In  such 
problems,  the  estimator  of  0  plays  a  central  role;  the  statistical  properties 
of  prediction  intervals  and  calibration  constructs  such  as  the  minimal 
detectable  concentration  will  be  highly  dependent  on  how  one  estimates  0 ;  see 
Carroll.  Davidian  and  Smith  (1986).  In  off-line  quality  control,  the  emphasis 
is  not  only  on  the  mean  response  but  also  on  its  variability;  Box  and  Meyer 
(1986)  state  that  "one  distinctive  feature  of  Japanese  quality  control 
improvement  techniques  is  the  use  of  statistical  experimental  design  to  study 
the  effect  of  a  number  of  factors  on  variance  as  well  as  the  mean.".  Effective 
estimation  of  variance  functions  could  play  a  major  role  in  this  application. 
It  should  be  evident  from  this  brief  review  that  far  from  being  only  a  nuisance 
parameter,  the  structural  variance  parameter  0  can  be  an  important  part  of  a 
statistical  analysis. 

The  above  discussion  suggests  the  need  for  a  unified  investigation  of 
estimation  of  variance  functions,  in  particular,  estimation  of  the  structural 
parameter  0.  Previous  work  in  the  literature  tends  to  treat  various  rpocial 
cases  of  (1.1)  as  different  models  with  their  own  estimation  methods.  The 
intent  of  this  paper  is  to  study  parametric  variance  function  estimation  in  a 


unified  way.  Nonparametric  variance  function  estimation  has  also  been  studied, 
see  for  exawple  Carroll  (1982);  we  will  confine  our  study  to  the  paraaetric 
setting. 

Paraaetric  variance  function  estimation  nay  be  thought  of  as  a  type  of 
regression  problem  in  which  we  try  to  understand  variance  as  a  function  of 
known  or  estimable  quantities,  and  in  which  8  plays  the  part  of  a  "regression" 
parameter.  The  major  insight  which  allows  for  a  unified  study  is  that  the 
absolute  residuals  from  the  current  fit  to  the  mean  or  the  sample  standard 
deviations  from  replicates  are  basic  building  blocks  for  analysis.  At  the 
graphical  level,  this  means  that  transformations  of  the  absolute  residuals  and 
sample  standard  deviations  can  be  used  to  gain  insight  into  the  structure  of 
the  variability  and  to  suggest  paraaetric  models.  For  estimation,  a  major 
contribution  is  to  point  out  that  most  of  the  methods  proposed  in  the 
literature  are  (possibly  weighted)  regressions  of  transformations  of  the  basic 
building  blocks  on  their  expected  values.  Many  exceptions  to  this  are  dealt 
with  in  this  article  as  well. 

Our  study  yields  these  major  qualitative  conclusions.  As  stated  here, 
they  apply  strictly  only  to  symmetric  error  distributions,  but  they  are  fairly 
definitive  and  one  is  unlikely  to  be  too  successful  ignoring  them  in  practice. 
Our  first  conclusion  is  that  robustness  plays  a  great  role  in  the  efficiency  of 
variance  function  estimation,  probably  even  greater  than  in  estimation  of  a 
mean  function.  For  example,  if  the  variance  does  not  depend  on  the  mean 
response,  the  standard  method  will  be  normal  theory  maximum  likelihood  as  in 
Box  A  Meyer  (1986).  A  weighted  analysis  of  absolute  residuals  yields  an 
estimator  only  12X  less  efficient  at  the  normal  model,  and  with  a  large  slope 
of  improvement  for  heavy  tailed  distributions.  This  slope  of  improvement  is 
much  larger  than  is  typical  in  regression  on  means.  For  a  standard 
contaminated  normal  model  for  which  the  best  robust  estimators  have  efficiency 


125%  with  respect  to  least  squares,  the  absolute  residual  estimator  of  the 
variance  function  has  efficiency  200%. 

Our  second  conclusion  concerns  the  fit  to  the  means  upon  which  the 
residuals  are  based.  It  has  been  our  experience  that  unweighted  least  squares 
residuals  yield  unstable  estimates  of  the  variance  function  when  the  variances 
depend  on  the  mean.  This  is  confirmed  in  our  study,  in  the  sense  that  the 
asymptotic  efficiency  of  the  variance  function  estimators  is  an  increasing 
function  of  the  variability  of  the  current  fit  to  the  means.  Thus,  we  suggest 
the  use  of  iterative  weighted  fitting,  so  that  the  variance  function  estimate 
is  based  on  generalized  least  squares  residuals.  As  far  as  we  can  tell,  this 
part  of  our  paper  is  one  of  the  first  formal  justifications  for  iteration  in  a 
generalized  least  squares  context. 

It  is  standard  in  many  applied  fields  to  take  m  replicates  at  each  design 
point,  where  usually  m  <  4.  Rather  than  using  (transformations  of)  absolute 
residuals  for  estimating  variance  function  parameters,  one  might  use  the  sample 
standard  deviations.  Our  third  conclusion  involves  the  efficiency  of  this 
substitution,  for  which  we  develop  an  asymptotic  theory.  The  effect  is 
typically,  although  not  always,  a  loss  of  efficiency,  at  least  when  there  are  m 
<  4  replicates.  The  clearest  results  occur  when  the  variance  does  not  depend 
on  the  mean.  Normal  theory  maximum  likelihood  is  a  weighted  regression  of 
squared  residuals;  the  corresponding  method  would  be  a  weighted  regression 
based  on  sample  variances.  Using  the  latter  entails  a  loss  of  efficiency,  no 
matter  what  the  underlying  distribution.  For  normally  distributed  data,  the 
efficiency  is  (m-l)/m,  thus  being  only  50%  for  duplicates.  For  other  methods, 
using  the  replicate  standard  deviations  can  be  more  efficient.  This  is 
particularly  true  of  a  method  due  to  Harvey  (1976),  which  is  based  on  th -i 
logarithm  of  absolute  residuals.  A  small  absolute  residual,  which  seems  to 
always  occur  in  practice,  can  wreak  havoc  with  this  method.  This  is  consistent 


with  our  Influence  function  calculations,  so  that  we  suggest  some  trimming  of 


the  smallest  absolute  residuals  before  applying  Harvey's  method. 

In  Section  2  we  describe  a  number  of  methods  for  estimation  of  9  .  We 
confine  our  attention  to  methods  which  are  in  common  use;  in  particular,  we  do 
not  discuss  robust  methods,  see  Giltinan,  Carroll  and  Ruppert  (1986).  In 
Section  3  we  present  an  asymptotic  theory  for  a  general  estimator  of  9  whose 
construction  encompasses  the  methods  of  Section  2.  Section  4  contains  examples 
of  specific  applications  of  our  theory  and  a  discussion  of  the  implications  of 
our  formulation.  Sketches  of  proofs  are  presented  in  Appendix  A. 


2.  ESTIMATION  OF  • 

We  now  discuss  the  form  and  motivation  for  several  estimators  of  9  in 

(1.1) .  In  what  follows,  let  /J#  be  a  preliminary  estimator  for  p.  This  could 

be  unweighted  least  squares  or  the  current  estimate  in  an  iterative  reweighted 
least  squares  calculation.  Let  the  errors  be  given  by  =  {Y^ 

f(xi.P)>/{og(zi./5,0))  and  denote  the  residuals  by  ri  =  Yj  -  f(Xj  ,/>*). 

2.1  Regression  Methods 

Pseudo-likelihood .  Given  p,,  the  pseudo-likelihood  estimator  maximizes 
the  normal  log-likelihood  i (p%,9  ,a) ,  where 

(2.1)  t(p,9,o)  =  -N  log  a  -  r^=1log(g(zi,p,9)) 

-  (2o2)_1Z^1  (Yi-f(xrP))2/g2(z1,p,e). 

see  Carroll  and  Ruppert  (1982a).  Generalizations  of  pseudo-likelihood  for 


robust  estimation  have  been  studied  by  Carroll  and  Ruppert  (1982a)  and 
Giltinan,  Carroll  and  Ruppert  (1986). 

Least  squares  on  squared  residuals.  Besides  pseudo-likelihood,  other 

methods  using  squared  residuals  have  been  proposed.  The  motivation  for  these 

methods  is  that  the  squared  residuals  have  approximate  expectation 
2  2 

a  g  (z j,/J,9),  see  Jobson  and  Fuller  (1980)  and  Aaemiya  (1977).  This  suggests  a 

2 

nonlinear  regression  problem  in  which  the  "responses"  are  (r^)  and  the 

2  2“ 

"regression  function"  is  a  g  (z  ,flm,9).  The  estimator  9  minimizes  in  9  and  a 

1  SK 

2i=1  -  °2b2(z r/5*.e)}2- 

4  4 

For  normal  data  the  squared  residuals  have  approximate  variance  a  g  (z^,/5.0); 
in  the  spirit  of  generalized  least  squares,  this  suggests  the  weighted 

estimator  which  minimizes  in  9  and  a 

(2.2)  2j=1  (r*  -  o2g2(zi,^,0)}2/g4(z1,^,e,), 

where  9m  is  a  preliminary  estimator  for  9,  9  for  example.  Full  iteration, 
when  it  converges,  would  be  equivalent  to  pseudo-likelihood. 

Accounting  for  the  effect  of  leverage.  One  objection  to  methods  such  as 
pseudo-likelihood  and  least  squares  based  on  squared  residuals  is  that  no 
compensation  is  made  for  the  loss  of  degrees  of  freedom  associated  with 

preliminary  estimation  of  /}.  For  example,  the  effect  of  applying 

pseudo-likelihood  directly  seems  to  be  a  bias  depending  on  p/N.  For  settings 
such  as  fractional  factorials  where  p  is  large  relative  to  N  this  bias  could  be 
substantial . 

Bayesian  ideas  have  been  used  to  account  for  loss  of  degrees  of  freedom; 


see  Harville  (1977)  and  Patterson  and  Thompson  (1974).  When  g  does  not  depend 


on  p,  the  restricted  maximum  likelihood  approach  of  the  latter  authors  suggests 
in  our  setting  one  estimate  8  from  the  mode  of  the  marginal  posterior  density 
for  8  assuming  normal  data  and  a  prior  for  the  parameters  proportional  to  a  1 . 
When  g  depends  on  p,  one  may  extend  the  Bayesian  arguments  and  use  a  linear 
approximation  as  in  Box  and  Hill  (1974)  and  Beal  and  Sheiner  (1986)  to  define  a 
restricted  maximua  likelihood  estimator. 

Let  Q  be  the  N  x  p  matrix  with  ith  row  f^J(xi  ,p)t/g(z^  ,p,8 ) ,  where  f^Xj./J) 
=»  d/dp  (f(Xj,/J)},  and  let  H  =  Q(Q^Q)  *(3*  be  the  "hat"  matrix  with  diagonal 
element  11^  =  h^(/3,0);  the  values  {h.^}  are  the  leverage  values.  It  turns  out 
that  the  restricted  maximum  likelihood  estimator  is  equivalent  to  an  estimator 
obtained  by  modifying  pseudo-likelihood  to  account  for  the  effect  of  leverage. 
This  characterization,  while  not  unexpected,  is  new;  we  derive  this  estimator 
and  its  equivalence  to  a  modification  of  pseudo-likelihood  in  Appendix  B. 

The  least  squares  approach  using  squared  residuals  can  also  be  modified  to 
show  the  effect  of  leverage.  Jobson  and  Fuller  (1980)  essentially  note  that 
for  nearly  normally  distributed  data  we  have  the  approximations 

E  r2  *<  o2(  1-h.  A  )g2(z  .  ,p,8  )  , 
var  r2  «  o4(  1-h^  )  2g4  ( z  J,  ,p  ,9  )  , 

To  exploit  these  approximations  modify  (2.2)  to  minimize  in  9  and  o 
(2.3)  Zi  =  l  {ri  -  o2(1-hil)g2(zi.p,,9))2/{(1-hii)2g4(z.  ,p*.e\)}, 

where  h^  =  h^fp^.e*)  and  9*  is  a  preliminary  estimator  for  9.  An 
asymptotically  equivalent  variation  of  this  estimator  in  which  one  sets  the 
derivatives  of  (2.3)  with  respect  to  9  and  a  equal  to  0  and  then  replaces  9 m  by 
9  can  be  seen  to  be  equivalent  to  pseudo-likelihood  in  which  one  replaces 


standardized  residuals  by  studentized  residuals.  While  this  estimator  also 
takes  into  account  the  effect  of  leverage,  it  is  different  from  restricted 
maximum  likelihood. 

Least _ squares  on  absolute _ residuals.  Squared  residuals  are  skewed  and 

long-tailed,  which  has  lead  many  authors  to  propose  using  absolute  residuals  to 
estimate  9;  see  Glejser  (1969)  and  Theil  (1971).  Assume  that 

E|Y1  -  f  (x±  ,/5 )  |  =  rjg(zi,p,9), 

which  is  satisfied  if  the  errors  {t^}  are  independent  and  identically 

distributed.  Mimicking  the  least  squares  approach  based  on  squared  residuals, 

one  obtains  the  estimator  6  by  minimizing  in  rj  and  9 

AR 

zi=i  {lrJ  ~  nz{zi  p*’9)}2- 

In  analogy  to  (2.2),  the  weighted  version  is  obtained  by  mimimizing 

Xi  =  l  ^  I  ri  1  "  '7gUrA,.8)}2/g2UrP,.e,). 
where  9*  is  a  preliminary  estimator  for  9,  probably  0  .  As  for  least  squares 

estimation  based  on  squared  residuals,  one  could  presumably  modify  this 
approach  to  account  for  the  effect  of  leverage. 

Logarithm  method.  The  suggestion  of  Harvey  (1976)  is  to  exploit  the  fact 
that  the  logarithm  of  the  absolute  residuals  has  approximate  expectation  log 
(og( Zj -P ) } •  Estimate  9  by  ordinary  least  squares  regression  of  log  | r^ |  on 
log  {og(  Zj  ,y£3,  ,9 ) } ,  since  if  the  errors  are  independent  and  identically 
distributed,  the  regression  should  be  approximately  homoscedastic .  If  one  of 
the  residuals  is  near  zero  the  regression  could  be  adversely  affected  by  a 
large  "outlier,"  hence  in  practice  one  might  wish  to  delete  a  few  of  the 


smallest  absolute  residuals,  perhaps  trimming  the  smallest  few  percent. 


9 


K 


Besides 

squares 

and 

logarithms 

of 

absolute 

residuals, 

other 

transformations 

could  be 

used. 

For  example, 

the 

square  root 

and  2/3  root 

would 

typically  be  sore  normally  distributed  than  the  absolute  residuals  themselves. 
Such  transformations  appear  to  be  useful,  although  they  have  not  been  used  much 
to  our  knowledge.  Our  asymptotic  theory  applies  to  such  transformations. 

In  a  parametric  model  such  as  (1.1),  joint  maximum  likelihood  estimation 
Is  possible,  where  we  use  the  term  maximum  likelihood  to  mean  normal  theory 
maximum  likelihood.  When  the  variance  function  does  not  depend  on  ft,  it  can  be 
easily  shown  that  maximum  likelihood  is  asymptotically  equivalent  to  weighted 
least  squares  methods  based  on  squared  residuals.  In  the  situation  in  which 
the  variance  function  depends  on  ft  this  is  not  the  case.  In  this  setting,  it 
has  been  observed  by  Carroll  and  Ruppert  (1982b)  and  McCullagh  (1983)  that 
while  maximum  likelihood  estimators  enjoy  asymptotic  optimality  when  the  model 
and  distributional  assumptions  are  correct,  the  maximum  likelihood  estimator  of 
ft  can  suffer  problems  under  departures  from  these  assumptions.  This  suggests 
that  joint  maximum  likelihood  estimation  should  not  be  applied  blindly. 

Methods  requiring  nr  >  2  replicates  at  each  x^  have  been  proposed  in  the 
assay  literature;  for  simplicity,  we  will  consider  only  the  case  of 
equi-replication  nr  s  m  and  write  in  obvious  fashion  (Y^),  j  =  l,...m,  to 
denote  the  m  observations  at  x^  where  appropriate.  These  methods  do  not  depend 
on  the  postulated  form  of  the  regression  function;  one  reason  that  this  may  be 
advantageous  is  that  in  many  assays  along  with  observed  pairs  (Y^.x^)  there 
will  also  be  pairs  in  which  only  is  observed.  A  popular  and  widely  used 
method  is  that  of  Rodbard  and  Frazier  (1975).  If  we  assume 


(2.4) 


g(zi./J,fl)  =  , 
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the  aethod  Is  Identical  to  the  logarithm  Method  previously  discussed  except 
that  one  replaces  |r^|  by  the  sanple  standard  deviation  s^  and  f(Xj,p,)  in  the 
"regression"  function  by  the  sanple  nean  Y^#  .  As  an  alternative,  under  the 
assunption  of  independence  and  (2.4),  the  Modified  naxinum  likelihood  method  of 
Raab  (1981)  estimates  9  by  joint  naxlnlzation  in  the  (N+r+1)  paraneters 
a2 ,8  ,fj  . of  the  "modified"  normal  likelihood 

(2.5)  ff^1{2no2g2(P1.z1,9))(m"1)/2exp(-r^1(Y1J-Ai1)2/(2o2g2(^1.zre)}] 


L'/.'  U WC  W  W  OKI 


In  this  section  we  construct  an  asymptotic  theory  for  a  general  class  of 
regression-type  estimators  for  8 .  Since  our  major  interest  lies  in  obtaining 
general  insights,  we  do  not  state  technical  assumptions  or  details. 


Write  d^(p)  =  |Y^  -  f(x^,/5)j.  Let  be  a  smooth  function  and  define  H 2  ^ 


H2,i  =  *2A(rj,9,p)  =  E  C  Vdi(/,)}  h 

where  q  is  a  scale  parameter  which  is  usually  a  function  of  o  only.  If  q%,  9^ 
and  are  any  preliminary  estimators  for  q,  9,  and  p,  define  q  and  9  to  be  the 
solutions  of 


(3.1)  N  1/2  Z*ml  H  4il(i7.«.P,)  (H^d^p,))  -  H2tl(i7.9.p,))  /  «3  ^  {q  .8  *  ,p,  )  , 


where  H  Aq  ,8  ,p)  is  a  smooth  function  and  H  .  is  usually  the  partial 

v  i  1  4,1 


derivative  of  H-  .  with  respect  to  (q,8). 

M  |  1 

The  class  of  estimators  solving  (3.1)  includes  directly  or  includes  an 

asymptotically  equivalent  version  of  the  estimators  of  Section  2.1.  For 

methods  which  account  for  the  effect  of  leverage.  H„  . ,  H_  .  and  H.  .  will 

2 , 1  3  >  1  4  f  1 

depend  on  the  h^.  In  this  case  we  need  the  additional  assumption  that  if  h  * 
1/2 

max  (hjj).  then  N  h  converges  to  zero. 

•*  *  1/2 

Theorem  3.1.  Let  qm,  8t  and  pM  be  N  consistent  for  estimating  q,  8  and  p. 
Let  Hj  be  the  derivative  of  Hj  and  define 


ci  •  H4.1  -  "a.i1  /  h3.i: 

B2.»  '  <H4.1/,,3,i)  W  ••/>»! 

3.K  *  <H4.i/B3.1>W/”  E 


Then,  under  regularity  conditions  as  N  -♦  <», 


(3.2)  B1>nN 


1/2 


n  -  n 


8-8 


■  N'1/2li-lCl  *  ,B2.N*B3.N>  *  “p'1* 


We  may  immediately  make  some  general  observations  about  the  estimator  8 
solving  (3.1).  Note  that  if  the  variance  function  does  not  depend  on  p,  then 
H  does  not  depend  on  p  and  hence  B,  K  =  0.  For  the  estimators  of  Section 
2.1,  Hj  is  an  odd  function.  Thus,  if  the  errors  (e^)  are  symmetrically 
distributed,  E[  H1(di(p))sign(fei)  ]  »  0  and  hence  ^  =  0. 


Corollary  3. 1 ( al .  Suppose  that  the  variance  function  does  not  depend  on  p  and 


the  errors  are  symmetrically  distributed.  Then  the  asymptotic  distributions  of 
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the  regression  estimators  of  Section  2.1  do  not  depend  on  the  method  used  to 
obtain  p m.  If  both  of  these  conditions  do  not  hold  simultaneously,  then  the 
asymptotic  distributions  will  depend  in  general  on  the  method  of  estimating  p. 

□ 

The  implication  is  that  in  the  situation  for  which  the  variance  function 
does  not  depend  on  p  and  the  data  are  approximately  symmetrically  distributed, 
for  large  sample  sizes  the  preliminary  estimator  for  p  will  play  little  role  in 
determining  the  properties  of  9.  Note  also  from  (3.2)  that  for  weighted 
methods,  the  effect  of  the  preliminary  estimator  of  9  is  asymptotically 
negligible  regardless  of  the  underlying  distributions. 

The  preliminary  estimator  p%  might  be  the  unweighted  least  squares 
estimator,  a  generalized  least  squares  estimator  or  some  robust  estimator. 
See,  for  example,  Huber  (1981)  and  Giltinan,  Carroll  and  Ruppert  (1986)  for 
examples  of  robust  estimators  for  p.  For  some  vectors  (vN  these  estimators 
admit  an  asymptotic  expansion  of  the  form 

(3.3)  N1/2(/J,  -  p)  =  N"1/2^=1  *(vn  j,^)  +  Op(l). 

Here  V  is  odd  in  the  argument  e. .  In  case  the  variance  function  depends  on  p, 

B„  „  *  0  in  general;  however,  if  the  errors  are  symmetrically  distributed  and 
2 ,  N 

p%  has  expansion  of  form  (3.3),  then  the  two  terms  on  the  right-hand  side  of 
(3.2)  are  asymptotically  independent.  The  following  is  then  immediate. 

Corollary  3.1(b).  Suppose  that  the  errors  are  symmetrically  distributed  and 
that  p%  has  an  asymptotic  expansion  of  the  form  (3.3).  Then  for  the  estimators 
of  Section  2.1,  the  asymptotic  covariance  matrix  of  9  is  a  monotone 
nondecreasing  function  of  the  asymptotic  covariance  matrix  of  p9. 

□ 
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By  the  Gauss-Markov  theorem  and  the  results  of  Jobson  and  Fuller  (1980) 
and  Carroll  and  Ruppert  (1982a),  the  iaplicatlon  of  Corollary  3.1(b)  is  that 
using  unweighted  least  squares  estimates  of  p  will  result  in  inefficient 
estiaates  of  0  .  This  phenoaenon  is  exhibited  in  saall  saaples  in  a  Monte  Carlo 
study  of  Carroll,  Davidian  and  Smith  (1986).  If  one  starts  froa  the  unweighted 
least  squares  estimate,  one  ought  to  iterate  the  process  of  estimating  8  —  use 
the  current  value  p,  to  estlaate  9  froa  (3.1),  use  these  p,  and  0  to  obtain  an 
updated  p,  by  generalized  least  squares  and  repeat  the  process  t  -  1  more 

tiaes.  It  is  clear  that  the  asymptotic  distribution  of  0  will  be  the  same  for 
t  >  2  with  larger  asymptotic  covariance  for  %  -  1,  so  in  principle  one  ought  to 
Iterate  this  process  at  least  twice.  See  Carroll,  Ruppert  and  Wu  (1986)  for 
more  on  iterating  generalized  least  squares. 

3.2  Methods  based  on  sample  standard  deviations 

Assume  at  each  of  M  design  points  we  have  m  >  2  replicate  observations  so 
that  N  =  Mm  represents  the  total  number  of  observations.  Let  (si>  be  the 
sample  standard  deviations,  which  themselves  have  been  proposed  as  estimators 
of  the  variance  in  generalized  least  squares  estimation  of  p.  This  can  be 
disasterous,  see  Jacquez,  Mather  and  Crawford  (1968).  When  replication  exists, 
however,  practitioners  feel  comfortable  with  the  notion  that  the  (s^)  may  be 
used  as  a  basis  for  estimating  variances;  thus,  one  might  reasonably  seek  to 
estimate  0  by  replacing  d^p,)  by  Sj  in  (3.1). 

The  following  result  is  almost  immediate  from  the  proof  of  Theorem  3.1  in 
Appendix  A.  Here  we  let  N  -*  «•  such  that  m  remains  fixed. 

Theorem  3.2.  If  d^(p^)  is  replaced  by  s^  in  (3.1),  then  under  the  conditions 
of  Theorem  3.1  the  resulting  estimator  for  0  satisfies  (3.2)  with  B„  „  =  0  and 

o  ,  N 
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9 


$ 


the  redefinitions 


(3.4a) 


(3.4b) 


Ci  •  "t.i'h.Wi*  -  H2.1): 

M2,i  ‘  E  <HJ(si*>  "  "2.  ,«.«■/»• 


If  the  errors  are  symmetrically  distributed,  then  from  (3.2)  and  Theorem 
3.2,  whether  one  is  better  off  using  absolute  residuals  or  sample  standard 
deviations  in  the  methods  of  Section  2.1  depends  only  on  the  differences 
between  the  expected  values  and  variances  of  H^td^/!)}  and  H^s.).  In  Section 
4  we  exhibit  such  comparisons  explicitly  and  show  that  absolute  residuals  can 
be  preferred  to  sample  standard  deviations  in  situations  of  practical 
importance. 


•WWK'Mm 


We  assume  throughout  this  discussion  that  the  variance  function  has  form 
(2.4)  and  that  m  >  2  replicates  are  available  at  each  Xj .  From  Section  2.1  we 
see  that  the  "regression  function"  part  of  the  estimating  equations  depends  on 
nXj./l*),  so  that  in  the  general  equation  (3.1)  H2  H3  ^  and  H4  ^  all  depend 
on  f(x4 ,/J*).  In  some  settings,  one  may  not  postulate  a  form  for  the  p  for 
estimating  9;  the  method  of  Rodbard  and  Frazier  (1975),  for  example,  uses  s^  in 
place  of  di(/J<)  as  in  Section  3.2  and  replaces  f(x^,/J, )  by  the  sample  mean  . 
We  now  consider  the  effect  of  replacing  predicted  values  by  sample  means  for 
the  general  class  (3.1). 

The  presence  of  the  sample  means  in  the  variance  function  in  (3  i ' 
requires  more  complicated  and  restrictive  assumptions  than  the  usual  large 
sample  asymptotics  applied  heretofore.  The  method  of  Rodbard  and  Frazier  and 
the  general  method  (3.1)  with  sample  means  are  functional  nonlinear  errors  in 


variables  problems  as  studied  by  Wolter  and  Puller  (1982)  and  Stefanski  and 

Carroll  (1985).  Standard  asymptotics  for  these  problems  correspond  to  letting 

-1/2 

a  go  to  zero  at  rate  N  In  Section  3.4  Me  discuss  the  practical 

Implications  of  o  being  small;  for  now,  we  state  the  following  result. 

Theorem _ 3.3.  ■  Suppose  that  we  replace  t(xi.fl9)  by  In  H2  r  H3  i  and  H4  i 

In  Theorems  3.1  and  3.2  and  adopt  the  assumptions  of  those  theorems.  Further, 
suppose  that  as  N  -»  «®,  a  -*  0  simultaneously  and 

(i)  N  1/2o  -♦  A ,  0  <  A  < 

(ii)  N1,/22^=1  C|  has  a  nontrivial  asymptotic  normal  limit  distribution; 

(iii)  The  (t^)  are  symmetric  and  i.i.d  ; 

_  p 

(iv)  { | ^  |  /  a)  has  uniformly  bounded  k  moments,  some  k  >  2. 

Then  the  results  of  Theorems  3.1  and  3.2  hold  with  B.  „  =*  B.  „  s  o. 

2  ,  N  3  ,  N 

□ 

This  result  shows  that  under  certain  restrictive  assumptions,  one  may 
replace  predicted  values  by  sample  means  ur  der  replication;  however,  it  is 
important  to  realize  that  the  assumption  of  small  a  is  not  generally  valid  and 
hence  the  use  of  sample  means  may  be  disadvantageous  in  situations  where  these 
asymptotics  do  not  apply. 

The  estimator  of  Raab  (1981)  discussed  in  Section  2.2  is  also  a  functional 
nonlinear  errors  in  variables  estimator,  complicated  by  a  parameter  space  with 
size  of  order  N.  Sadler  and  Smith  (1985)  have  observed  that  the  Raab  estimator 
is  often  indistinguishable  from  the  same  estimator  with  fj ^  replaced  by  In 

(2.5);  such  an  estimator  is  contained  in  the  general  class  (3.1).  Davidian 
(1986)  has  shown  that  under  the  asymptotics  of  Theorem  3.3  and  additional 
regularity  conditions  that  the  two  estimators  are  asymptotically  equivalent  in 


an  important  special  case.  We  may  thus  consider  the  result  of  Theorem  3.3 


relevant  to  this  estimator. 

» 

! 

j  3.4  Small  a  asymptotics 

I 

) 

j  In  Section  3.3  technical  considerations  forced  us  to  pursue  an  asymptotic 

>  theory  in  which  a  is  saall.  It  turns  out  that  in  some  situations  of  practical 

importance  these  asymptotics  are  relevant.  In  particular,  in  assay  data  we 

f 

|  have  observed  values  for  a  which  are  quite  saall  relative  to  the  means.  Such 

asymptotics  are  used  in  the  study  of  data  transformations  in  regression.  It  is 
thus  worthwhile  to  consider  the  effect  of  saall  a  on  the  results  of  Sections 
3.1  and  3.2  and  to  comment  on  some  other  implications  of  letting  o  -*  0. 

In  the  situation  of  Theorem  3.1,  if  the  errors  are  symmetrically 

distributed,  then  for  the  estimators  of  Section  2.1,  if  o  -»  o  as  N  -*  <»,  then 

there  is  no  effect  for  estimating  the  regression  parameter  p.  In  the  situation 
of  Theorem  3.2,  the  errors  need  not  even  be  symmetrically  distributed.  The 
major  insight  provided  by  these  results  is  that  in  certain  practical  situations 

in  which  o  is  saall,  the  choice  of  p0  may  not  be  too  important  even  if  the 

variance  function  depends  on  p. 

Saall  o  asymptotics  may  be  used  also  to  provide  insight  into  the  behavior 
of  other  estimators  for  9  which  do  not  fit  into  the  general  framework  of  (3.1). 
Davidlan  (1986)  has  shown  that  for  fixed  a  the  extended  quasi-likelihood 
estimator  of  9  of  Nelder  and  Pregibon  (1986)  and  McCullagh  and  Nelder  (1983) 
need  not  be  consistent.  If  one  adopts  the  asymptotics  of  the  previous  section, 
however,  it  is  easily  shown  that  the  extended  quasi-likelihood  estimator  is 


asymptotically  equivalent  to  regression  estimators  based  on  squared  residuals. 


In  Section  3  we  constructed  an  asyaptotic  theory  which  and  stateed  sone 
general  characteristics  of  regression-type  estimators  of  9 .  In  this  section  we 
use  the  theory  to  exhibit  the  specific  forms  for  the  various  estimators  of 
Section  2  and  coapare  and  contrast  their  properties.  Throughout,  define 

v(i,p,8)  -  log  g(z1(/8.9), 

and  let  v  (i,/J,9)  and  v  (i,/J,9)  be  the  column  vectors  of  partial  derivatives  of 

9  P 

v  with  respect  to  9  and  p.  Further,  let  ((/3,9)  be  the  covariance  matrix  of 

vfl  ( i  ,/J ,9  ) .  For  simplicity,  assume  that  the  errors  {e.^}  are  independent  and 

identically  distributed  with  kurtosis  ic;  k  *»  0  for  normality. 

4.1  Pseudo- likelihood,  restricted  maximum  likelihood  and  weighted 
squared  residuals. 

1/2 

If  when  accounting  for  the  effect  of  leverage  we  let  h  -*  0  such  that  N  h 
-♦  0,  then  these  methods  are  asymptotically  equivalent.  Writing  q  =  log  a,  we 
have  HJ(x)  -  x2.  Hg  {  =  exp(2q)  g2(z l,p,9),  H3  [  =  H2  { 

and  E  [  H^d^/J)}  signffcj)  ]  =  2  E  [  -  f  ( x±  ,/3 )  ]  =  0  so  that  B3  N  3  0 

regardless  of  the  underlying  distributions.  If  g  does  not  depend  on  /!,  or  o  -* 

*  -1/2 

0.  then  as  long  as  p+-p  =  0p{oN  ),  9  is  asymptotically  normally  distributed 
with  mean  9  and  covariance  matrix 

(4.1)  (2  +  x)  {4N  {(/5.9))"1. 

As  mentioned  in  Section  3,  under  the  small  o  asymptotics  of  Theorem  3.3, 
the  extended  quasi-likelihood  estimator  of  9  is  asymptotically  equivalent  to 
the  estimators  here  with  asymptotic  covariance  matrix  (4.1).  It  has  been  shown 


by  Davidlan  (1986)  that  these  methods  are  asymptotically  equivalent  to  maximum 


likelihood  for  general  underlying  distributions,  so  that  pseudo-likelihood, 


weighted  squared  residuals,  restricted  maximum  likelihood,  maximum  likelihood 
and,  if  a  0,  extended  quasi-likelihood,  are  all  asyaptotically  equivalent. 
In  addition,  all  of  these  estiaators  have  Influence  functions  which  are  linear 
in  the  squared  errors,  indicating  substantial  nonrobustness. 

We  aay  also  observe  that  these  aethods  are  preferable  to  unweighted 
regression  on  squared  residuals.  Write  (4.1)  as 

(4.2)  (1/2  +  K/4)  (VW_1V)-1, 

where  W  is  the  N  x  N  diagonal  matrix  with  elements  H  and  V  is  the  N  x  p 

O  t  1 

matrix  with  i^*1  row  ^ .  For  the  unweighted  estimator  based  on  squared 
residuals,  calculations  similar  to  those  above  show  that  the  asymptotic 
covariance  matrix  when  either  g  does  not  depend  on  p  or  o  ^  0  is  given  by 

(4.3)  (1/2  +  ft/4)  (VtV)"1(VtWV)(VtV)_1. 

The  comparison  between  (4.2)  and  (4.3)  is  simply  that  of  the  Gauss-Markov 
theorem,  so  that  (4.2)  is  no  larger  than  (4.3). 


We  do  not  consider  deletion  of  the  few  smallest  absolute  residuals.  Here 
Hj(x)  =  log  x  so  that  H^(x)  =  x  *.  Letting  rj  =  log  a  and  assuming  independent 
and  identically  distributed  errors  we  have  ^  =  q  +  u(i,/3,0)  +  E  log  le1  . 
H3  ^  s  l,  and  ^  =  r(i,/3,0).  Under  the  assumption  of  symmetry  of  the  errors, 
with  g  not  depending  on  p  or  a  -#  0,  tedious  algebra  shows  that  6  is 
asymptotically  normally  distributed  with  mean  0  and  covariance  matrix 


!,  iJVIVIS. 


(4.4) 


var  {log  |*|2}  {4N  $(/i,9)}  1 . 


The  influence  function  for  this  estimator  is  linear  in  the  logarithm  of  the 


absolute  errors.  This  indicates  nonrobustness  nore  for  inliers  than  for 


ourliers,  which  at  the  very  least  is  an  unusual  phenomenon .  If  the  errors  are 


not  syaaaetric  then  there  will  be  an  additional  effect  due  to  estimating  ,0  not 


present  for  the  methods  of  Section  4.1,  even  if  g  does  not  depend  on  p. 


Assume  that  the  errors  are  independent  and  identically  distributed  and  let 


exp(/7)  =  oE|t|.  Consider  the  weighted  estimator.  We  have  H^x)  =  x,  H^x) 


1.  H2  i  =  exp (q )  g(z1</J,9)  and  Hg  i 


Thus,  if  the  errors  are 


symmetrically  distributed  and  either  g  does  not  depend  on  p  or  a  -*  0,  9  is 


asymptotically  normally  distributed  with  mean  9  and  covariance  matrix 


(4.5) 


(0/(1  -  0))  {N  $(P,9))  \ 


where  0  =  var  |t|.  The  influence  function  for  this  estimator  is  linear  in  the 


absolute  errors.  By  an  argument  similar  to  that  at  the  end  of  Section  4.1,  we 


may  conclude  that  when  the  effect  of  pn  is  negligible  one  should  use  a  weighted 


estimator  and  iterate  the  method. 


We  assume  that  the  errors  are  symmetric  and  independent  and  identically 


distributed  and  that  either  g  does  not  depend  on  p  or  a  is  small.  Using  (4.1), 
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(4.4)  and  (4.5),  the  asymptotic  relative  efficiency  (ARE)  of  the  three  methods 
depends  only  on  the  distribution  of  the  errors.  The  ARE  of  the  weighted 
absolute  residual  method  to  pleudo-likelihood  is  the  same  as  the  asymptotic 
relative  efficiency  of  the  mean  absolute  deviation  with  respect  ot  the  sample 
variance  for  a  single  sample,  see  Huber  (1981,  page  3).  For  normal  errors, 
using  absolute  residuals  results  in  a  12*  loss  in  efficiency  while  for  standard 
double  exponential  errors  there  is  a  25*  gain  in  efficiency  for  using  absolute 
residuals.  For  normal  errors,  the  logarithm  method  represents  a  59*  loss  of 
efficiency  with  respect  to  pseudo-likelihood. 

In  Table  1  we  present  ARE's  for  various  contaminated  normal  distributions. 
The  table  shows  that  while  at  normality  neither  the  absolute  residuals  nor  the 
logarithm  methods  are  efficient,  a  very  slight  fraction  of  "bad"  observations 
is  enough  to  offset  the  superiority  of  squared  residuals  in  a  dramatic  fashion. 
For  example,  just  two  bad  observations  in  1000  negate  the  superiority  of 
squared  residuals.  If  1*  or  5*  of  the  data  are  "bad,"  absolute  residuals  and 
the  logarithm  method,  respectively,  show  substantial  gains  over  squared 
residuals.  The  implication  is  that  while  it  is  commonly  perceived  that  methods 
based  on  squared  residuals  are  to  be  preferred  in  general,  these  methods  can  be 
highly  non-robust.  Our  formulation  includes  this  result  for  maximum 
likelihood,  showing  its  inadequacy  under  slight  departures  from  the  assumed 
distributional  structure. 

4.5  Methods  based  on  sample  standard  deviations 

Assume  that  m  >  2  replicate  observations  are  available  at  each  design 
point.  In  practice,  m  is  usually  small  ,  see  Raab  (1981).  We  compsr:-  using 
absolute  residuals  to  using  sample  standard  deviations  in  the  estimators  of 


Section  2.1. 


For  simplicity,  assume  that  the  errors  are  independent  and 


identically  and  symmetrically  distributed  and  that  either  g  does  not  depend  on 

fi  or  o  is  small.  If  the  errors  are  not  symmetric  and  a  is  not  small  or  the 

variance  depends  on  fl,  using  sample  standard  deviations  presumably  will  be  more 

efficient  than  in  the  discussion  below.  This  issue  deserves  further  attention. 
2 

Let  s  be  the  sample  variance  of  m  errors  {t  . «.  }.  It  is  easily  shown 

I  1  n 

by  calculations  analagous  to  those  of  section  4.1  that  replacing  absolute 
residuals  by  sample  standard  deviations  has  the  effect  of  changing  the 
asymptotic  covariance  matrices  (4.1).  (4.4)  and  (4.5)  to 

(4.6)  Pseudo-likelihood  :  {(2  +  k  )  +  2/(m  -  1)}  (4N  £  (/J  .0  )  }-1  ; 

(4.7)  Logarithm  method  :  m  var  {  log  (s2)  }  (4N  $(p,0)}  1  ; 

m 

(4.8)  Weighted  absolute  residuals  :  (m  6m  /  (1  -  O^)}  (N  {  (/),?)}  1 , 

where  O  =  var  (s  ).  Table  2  contains  the  asymptotic  relative  efficiencies  of 

m 

using  sample  standard  deviations  to  using  transformations  of  absolute  residuals 

for  various  values  of  m  when  the  errors  are  standard  normal.  The  values  in  the 

2 

table  for  H^x)  =  x  and  x  indicate  that  if  the  data  are  approximately  normally 
distributed,  using  sample  standard  deviations  can  entail  a  loss  in  efficiency 
with  respect  to  using  residuals  if  m  is  small.  For  substantial  replication  (m 
>  10),  using  sample  standard  deviations  produces  a  slight  edge  in  efficiency 

with  respect  to  weighted  absolute  residuals  for  =  x. 

The  second  column  of  Table  2  shows  that  for  the  logarithm  method,  using 
sample  standard  deviations  surpasses  using  residuals  in  terms  of  efficiency 
except  when  m  =  2  and  is  more  than  twice  as  efficient  for  large  m.  In  its  raw 
form,  log  jr^j  is  very  unstable  because,  at  least  occasionally,  |rv|  *■  ' 
producing  a  wild  "outlier"  in  the  regression.  The  effect  of  usirj  ac,;«p/o 
standard  deviations  is  to  decrease  the  possibility  of  such  inliers;  the  sample 
standard  deviations  will  be  likely  more  uniform,  especially  as  m  increases 


v% 


The  implication  is  that  the  logarithm  aethod  should  not  be  based  on  residuals 
unless  remedial  measures  are  taken.  The  suggestion  to  trim  a  few  of  the 
smallest  absolute  residuals  before  using  this  method  is  clearly  supported  by 
the  theory;  presumably,  such  trimming  would  reduce  or  negate  the  theoretical 
superiority  of  using  sample  standard  deviations. 

Table  3  contains  the  asymptotic  relative  efficiencies  of  weighted  squared 
sample  standard  deviations  and  logarithms  of  these  to  weighted  squared 
residuals  under  normality  of  the  errors.  The  first  column  is  the  efficiency  of 
Raab's  method  to  pseudo-likelihood,  and  the  second  column  is  the  efficiency  of 
the  Rodbard  and  Frazier  method  to  pseudo-likelihood.  The  results  of  the  table 
imply  that  using  the  Raab  and  Rodbard  and  Frazier  methods,  which  are  popular  in 
the  analysis  of  radioimmunoassay  data,  can  entail  a  loss  of  efficiency  when 
compared  to  methods  based  on  weighted  squared  residuals.  Davidian  (1986)  has 
shown  that  the  Rodbard  and  Frazier  estimator  can  have  a  slight  edge  in 
efficiency  over  the  weighted  squared  residuals  methods  for  some  highly 
contaminated  normal  distributions.  Using  (4.6),  the  squared  residual  methods 
will  be  more  efficient  than  Raab's  method  in  the  limit.  Table  3  also  addresses 
the  open  question  as  to  whether  Raab's  method  is  asymptotically  more  efficient 
that  the  Rodbard  and  Frazier  method  for  normally  distributed  data.  The  answer 
is  a  general  yes,  thus  agreeing  with  the  Monte-Carlo  evidence  available  when 
the  variance  is  a  power  of  the  mean. 


In  Section  3  we  constructed  a  general  theory  of  regression-type  estimation 
for  9  in  the  heteroscedastic  model  (1.1).  This  theory  includes  a;;  special 
cases  common  methods  described  in  Section  2  and  allows  for  the  regression  to  be 
based  on  absolute  residuals  from  the  current  regression  fit  as  well  as  sample 


standard  deviations  in  the  event  of  replication  at  each  design  point.  Under 
various  restrictions  such  as  symmetry  or  small  a,  when  the  variance  function  g 
does  not  depend  on  p,  we  showed  in  Sections  3  and  4  that  we  can  draw  general 
conclusions  about  this  class  of  estimators  as  well  as  make  comparisons  among 
the  various  methods. 

When  employing  methods  based  on  residuals,  one  should  weight  the  residuals 
appropriately  and  iterate  the  process.  There  can  be  large  relative  differences 
among  the  methods  in  terms  of  efficiency.  Under  symmetry  of  the  errors, 
squared  residuals  are  preferable  for  approximately  normally  distributed  data, 
but  this  preference  is  tenuous,  these  can  be  highly  non-robust  under  only 
slight  departures  from  normality;  methods  based  on  logarithms  or  the  absolute 
residuals  themselves  exhibit  relatively  more  robust  behavior.  For  the  small 
amount  of  replication  found  in  practice,  using  sample  standard  deviations 
rather  than  residuals  can  entail  a  loss  in  efficiency  if  estimation  is  based  on 
the  squares  of  these  quantities  or  the  quantities  themselves.  For  the 
logarithm  method  based  on  residuals,  trimming  the  smallest  few  absolute 
residuals  is  essential,  since  for  normal  data  using  sample  standard  deviations 
is  almost  always  more  efficient  than  using  residuals,  even  for  a  small  number 
of  replicates.  Popular  methods  applications  such  as  radioimmunoassay  based  on 
sample  means  and  sample  standard  deviations  can  be  less  efficient  than  methods 
based  on  weighted  squared  residuals. 

Efficient  variance  function  estimation  in  heteroscedastic  regression 
analysis  is  an  important  problem  in  its  own  right.  There  are  important 
differences  in  estimators  for  variance  when  it  is  modeled  parametrically. 
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APPENDIX  A.  PROOFS  OF  MAJOR  RESULTS 


We  now  present  sketches  of  the  proofs  of  the  theorems  of  Section  3.  Our 
exposition  is  brief  and  nonrigorous  as  our  goal  is  to  provide  general  insights. 
In  what  follows,  we  assume  that 


(A.  1 ) 


,1/2 


17  -  n 

9-9 


V1*: 


under  sufficient  regularity  conditions  it  is  possible  to  prove  (A.l).  Such  a 

proof  would  be  long,  detailed  and  essentially  nonlnformative;  see  Carroll  and  , 

< 

1/2 

Ruppert  (1982a)  for  a  proof  of  N  consistency  in  a  special  case.  , 


Sketch  of  proof  of  Theorem  3.1:  From  (3.1),  a  Taylor  series,  the  fact  that  E  [ 
Hi {di (p ) }  ]  =  H2  i  and  laws  of  large  numbers,  we  have 


(A. 2) 


N '1/2Z* 


1  (H4.i/H3,i)lHl{di(/,*)}  "  H2,i('7’®’/,*)] 


Opd) 


By  the  arguments  of  Ruppert  and  Carroll  (1980)  or  Carroll  and  Ruppert  (1982a), 


'  *»  v 


tia  k'lktb'lb'Uh 


-1^  ac-aW  ik  iwtii 


y.a 


,-l/2_N 


N  (H4  j/Hg  j)  -  d^fi)}  +  a  11) 


-  b3(N  N  1/£(P9-P)  ♦•(!>. 


Applying  this  result  to  (A. 2)  along  with  a  Taylor  series  in  H0  .  gives 

2*  1 


°.n-1/2x^i  c4  mb2iM  +  b3iII)  n  1/2Cp,-P) 


-bi.nn 


n  -  n 


0-6 


♦  V1*- 


which  is  (3.2) . 


Theorew  3.2  follows  by  a  similar  argument;  in  this  case  the  representation 
(A. 3)  is  unnecessary. 


Sketch  of  proof  of  Theorew  3.3:  We  consider  Theorem  3.2;  the  proof  for  Theorem 
3.1  is  similar.  Recall  here  that  (2.4)  holds.  In  the  following,  all 
derivatives  are  with  respect  to  the  mean  p^  and  the  definitions  of  Cj  and  H2  i 
are  as  in  (3.4) . 

Assumption  (iv)  implies  that  |Y^  -  p^\  — o  so  that  a  Taylor 


series  in  q ,  0  and  gives 


"l.»  1,1/2  '  S'1/2i 1.1  C1  -N'1/2l?.1("2.lH4.l/H3.1»,,l.  -  "i> 


^  —1/2  «= 

Since  Yi<  -  p^  ■  a  g(p  ^,z  ^,0  )  AN  g(P  ,  where  is  the  mean 


V  \  \  *,  „  -V 


y  .•\-vv\.v  /*  v  • 

_N  VA  A  \  >  A\V  •/ 


v  -V*  '•  >  V  -•*  a  ^  ^  , 

*->  *.*  ■  »•  *  «  *  *  "j  "  m  *  A  i 
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of  the  errors  at  x^ ,  we  can  write  the  last  two  terms  on  the  right-hand  side  of 
(A. 4)  as 


(q 


1,1 


qi,2Ci) 


for  constants  (q.  }.  By  assuaption  (v),  since*,  has  aean  zero,  (A. 5) 

1  ,  J  1  * 

converges  in  probability  to  zero  if  Efe^Cj)  =  0,  which  holds  under  the 

assuaption  of  syaaetry.  Thus,  (A. 5)  converges  to  zero  which  froa  (A. 4) 

coapletes  the  proof.  Note  that  if  we  drop  the  assuaption  of  syaaetry,  froa 

1/2  “ 

(A. 5)  the  asyaptotic  normal  distribution  of  N  (0-0)  will  have  aean 
1  *  1  S.'l'l.i  "• 

Nh** 

APPENDIX  B.  CHARACTERIZATION  OP  RESTRICTED  MAXIMUM  LIKELIHOOD 


Let  pn  be  a  generalized  least  squares  estimator  for  p.  Assume  first  that 
g  does  not  depend  on  p.  Let  the  prior  distribution  for  the  parameters  n(p,9  ,o) 
be  proportional  to  o  1 .  The  marginal  posterior  for  9  is  hard  to  compute  in 
closed  fora  for  nonlinear  regression.  Following  Box  and  Hill  (1974)  and  Beal 
and  Sheiner  (1986),  we  have  the  linear  approximation 

f(xr/3)  -  f(xr/J,)  ♦  tp(xi,pm)t(p-pj. 

Replacing  f(x^,p)  by  its  linear  expansion,  the  marginal  posterior  for  9  is 
proportional  to 


P(» ) 


(WN  2, ...-1/2 


Tn^pT 


(9)  (Det  Sr(0)> 


ITT 


(B.  1) 


,  where 


o"(«)  -  (N-p)  Zj.jr-  /  Z  (zrP,.«). 


SG(«)  -  M-1zJ.1f^(x1.i.)f^(x1.i*)t  /  g2(zr/J,.®) 


and  where  Oet  A  -  determinant  of  A.  If  the  variances  depend  on  p,  we  extend 


the  Bayesian  arguments  by  replacing  g.(0)  by  gfZj./J,,*). 

Let  H  be  the  hat  matrix  H  evaluated  at  pm  and  let  h^  -  h^t/S^.fl).  Prom 


(2.1),  pseudo-likelihood  solves  in  (6 ,a) 


(B.2)  zj=1  [rJ/loVlZji,.*)}] 


wtf(zi./J,,e) 


«V*i  •*•••> 


Since  H  is  idempotent,  the  left  hand  side  of  (B.2)  has  approximate  expectation 


(B.3) 


1  -  P/N 


u$(z i,P».9)  (1-  h1A) 


To  modify  pseudo- likelihood  to  account  for  loss  of  degrees  of  freedom,  equate 


the  left  hand  side  of  (B.2)  to  (B.3).  Prom  matrix  computations  as  in  Nel 


(1980),  this  can  be  shown  to  be  equivalent  to  restricted  maximum  likelihood. 
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Table  1 

Asymptotic  relative  efficiency  with  respect  to  weighted  squared  residuals 
for  contaminated  normal  distributions  with  distribution  function  F(x)  -  (1  - 
a)#(x)  +  a*(x/3) . 


contamination 
fraction  a 


0.000 

0.001 

0.002 

0.010 

0.050 


weighted  absolute 
residuals 


0.876 

0.948 

1.016 

1.439 

2.035 

Table  2 


logarithms  of 
absolute  residuals 


0.405 

0.440 

0.480 

0.720 

1.220 


Asymptotic  relative  efficiency  of  using  sample  standard  deviations  to  using 
absolute  residuals  under  normality  for  H^x)  (weighted  methods). 
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CM 

* 

log  X 

X 

2 

0.500 

0.500 

0.500 

3 

0.667 

1.000 

0.696 

4 

0.750 

1.320 

0.801 

• 

• 

• 

• 

• 

• 

• 

• 

• 

• 

9 

0.889 

1.932 

0.986 

10 

0.900 

1.984 

1.001 

00 

1.000 

2.467 

1.142 

■/  / , 


Table  3 


Asymptotic  relative  efficiency  of  using  sample  standard  deviations  to 
weighted  squared  residuals  under  normal  errors  for  H  (x) . 


A 

2 

2 

log  X 

2 

0.500 

0.203 

3 

0.667 

0.405 

4 

0.750 

0.535 

5 

0.800 

0.620 

6 

0.833 

0.680 

7 

0.857 

0.723 

8 

0.875 

0.757 

9 

0.889 

0.783 

10 

0.900 

0.804 
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